Streaming Pointwise Mutual Information
Authors
Abstract
Recent work has made it possible to perform space-efficient, approximate counting over large vocabularies in a streaming context. Motivated by the existence of data structures of this type, we explore the computation of associativity scores, otherwise known as pointwise mutual information (PMI), in a streaming context. We give theoretical bounds showing the impracticality of perfect online PMI computation, and detail an algorithm with high expected accuracy. Experiments on news articles show that our approach gives high accuracy on real-world data.
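For orientation, PMI for a pair (x, y) is conventionally defined as log [ p(x, y) / (p(x) p(y)) ], which can be estimated from counts as log [ c(x, y) · N / (c(x) · c(y)) ]. The sketch below is a minimal illustration of that definition over a stream of pairs, not the algorithm from the paper: it keeps exact counts in `collections.Counter` objects, where a streaming system would presumably substitute an approximate counting structure. The function names `stream_counts` and `pmi`, the base-2 logarithm, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def stream_counts(pairs):
    """Consume an iterable of (x, y) pairs once, keeping exact counts.

    Assumption: exact Counters stand in for the approximate,
    space-efficient counters a real streaming setup would use.
    """
    pair_counts, x_counts, y_counts = Counter(), Counter(), Counter()
    total = 0
    for x, y in pairs:
        pair_counts[(x, y)] += 1
        x_counts[x] += 1
        y_counts[y] += 1
        total += 1
    return pair_counts, x_counts, y_counts, total


def pmi(x, y, pair_counts, x_counts, y_counts, total):
    """Estimate PMI(x, y) = log2( c(x,y) * N / (c(x) * c(y)) )."""
    c_xy = pair_counts[(x, y)]
    if c_xy == 0:
        # The pair was never observed, so the estimate is -infinity.
        return float("-inf")
    return math.log2(c_xy * total / (x_counts[x] * y_counts[y]))


# Hypothetical toy stream of co-occurring word pairs.
stream = [("new", "york"), ("new", "york"), ("new", "car"), ("old", "car")]
counts = stream_counts(stream)

print(pmi("old", "car", *counts))   # 1.0
print(pmi("new", "york", *counts))  # positive: the pair co-occurs more than chance
print(pmi("new", "car", *counts))   # negative: the pair co-occurs less than chance
```

With exact counts the memory cost grows with the number of distinct pairs, which is the cost that motivates approximate counting in the streaming setting.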
Similar resources
Finding Heavily-Weighted Features with the Weight-Median Sketch
We introduce the Weight-Median Sketch, a sub-linear space data structure that captures the most heavily weighted features in linear classifiers trained over data streams. This enables memory-limited execution of several statistical analyses over streams, including online feature selection, streaming data explanation, relative deltoid detection, and streaming estimation of pointwise mutual infor...
Probability Mass Exclusions and the Directed Components of Pointwise Mutual Information
The pointwise mutual information quantifies the mutual information between events x and y from random variables X and Y. This article considers the pointwise mutual information in a directed sense, examining precisely how an event y provides information about x via probability mass exclusions. Two distinct types of exclusions are identified, namely informative and misinformative exclusions. Then...
Normalized (Pointwise) Mutual Information in Collocation Extraction
In this paper, we discuss the related information theoretical association measures of mutual information and pointwise mutual information, in the context of collocation extraction. We introduce normalized variants of these measures in order to make them more easily interpretable and at the same time less sensitive to occurrence frequency. We also provide a small empirical study to give more ins...
Two Multivariate Generalizations of Pointwise Mutual Information
Since its introduction into the NLP community, pointwise mutual information has proven to be a useful association measure in numerous natural language processing applications such as collocation extraction and word space models. In its original form, it is restricted to the analysis of two-way co-occurrences. NLP problems, however, need not be restricted to two-way co-occurrences; often, a parti...
Weakly Supervised Object Detection with Pointwise Mutual Information
In this work a novel approach for weakly supervised object detection that incorporates pointwise mutual information is presented. A fully convolutional neural network architecture is applied in which the network learns one filter per object class. The resulting feature map indicates the location of objects in an image, yielding an intuitive representation of a class activation map. While tradit...